Deadwood Detection and Elimination in Text Summarization for Punjabi Language

Authors

  • Mandeep Kaur
  • Jagroop Kaur
Abstract

As the internet grows rapidly, the amount of available information has become very large. Text summarization provides a condensed version of such information, no longer than half of the original text. This paper proposes a system for detecting and removing deadwood in summaries for the Punjabi language. Deadwood is a word or phrase that can be omitted without loss of meaning; removing it shortens and clarifies the summary. The first step is preprocessing, which consists of sentence segmentation and removal of Punjabi stop words. In the second step, a weight is assigned to each sentence in the source text, using five different features. In the next step, the highest-scoring sentences are selected to form the summary. In the last step, deadwood is removed from the summary.
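
The four steps in the abstract can be sketched roughly as below. This is a minimal illustration, not the authors' implementation. The abstract does not name the five weighting features, so the sketch substitutes a single assumed feature (summed term frequency of content words). The stop-word set, the deadwood phrase list, the function names, and the `ratio` parameter are all hypothetical and supplied by the caller. Sentences are split on the danda (`।`), `?`, and `!`.

```python
import re
from collections import Counter

# Sentence terminators: the danda used in Punjabi (Gurmukhi) text, plus ? and !
SENTENCE_END = re.compile(r"[।?!]")


def split_sentences(text):
    """Step 1a: split text into sentences on the danda, '?' or '!'."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def content_words(sentence, stop_words):
    """Step 1b: whitespace-tokenize a sentence and drop stop words."""
    return [w for w in sentence.split() if w not in stop_words]


def score_sentences(sentences, stop_words):
    """Step 2: weight each sentence.

    Assumption: a single stand-in feature (summed term frequency of the
    sentence's content words), not the five features used in the paper.
    """
    freq = Counter(w for s in sentences for w in content_words(s, stop_words))
    return [sum(freq[w] for w in content_words(s, stop_words)) for s in sentences]


def remove_deadwood(sentence, deadwood_phrases):
    """Step 4: delete listed phrases that appear as whole whitespace-bounded tokens."""
    for phrase in sorted(deadwood_phrases, key=len, reverse=True):
        pattern = r"(?<!\S)" + re.escape(phrase) + r"(?!\S)"
        sentence = re.sub(pattern, " ", sentence)
    return " ".join(sentence.split())


def summarize(text, stop_words, deadwood_phrases, ratio=0.5):
    """Run all four steps; returns the summary as a list of sentences."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = score_sentences(sentences, stop_words)
    # Step 3: keep the highest-scoring sentences (at least one),
    # then restore their original order.
    keep = max(1, int(len(sentences) * ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return [remove_deadwood(sentences[i], deadwood_phrases) for i in sorted(top)]
```

The default `ratio=0.5` mirrors the abstract's statement that a summary is no longer than half of the original text. A real system would replace the placeholder score with its own feature set and use curated Punjabi stop-word and deadwood lists.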


Similar resources

Automatic Punjabi Text Extractive Summarization System

Text summarization condenses the source text into a shorter form while retaining its information content and overall meaning. The Punjabi text summarization system is an extraction-based summarization system that summarizes Punjabi text by retaining relevant sentences based on statistical and linguistic features of the text. The Punjabi text summarization system is available online at webs...


Complete Pre Processing Phase of Punjabi Text Extractive Summarization System

Text summarization condenses the source text into a shorter form while retaining its information content and overall meaning. The Punjabi text summarization system is an extraction-based summarization system that summarizes Punjabi text by retaining the relevant sentences based on statistical and linguistic features of the text. It comprises two main phases: 1) Pre Processing 2) Pro...


Automatic Text Summarization System for Punjabi Language

This paper concentrates on a single-document multi-news Punjabi extractive summarizer. Although a lot of research is under way on multi-document news summarization systems, not a single paper was found in the literature on single-document multi-news summarization for any language. This is the first time such a system has been developed for the Punjabi language, and it is available online at: http...


Maximum Entropy Approach based Named Entity Recognition in Punjabi Language

Named Entity Recognition (NER) is the task of identifying and classifying named entities into predefined categories such as person, location, and organization. NER is used in many applications, such as text summarization, text classification, question answering, and machine translation systems. For English, a lot of work has already been done in the field of NER, where capitalization is a major key fo...


Identification and Separation of Complex Sentences from Punjabi Language

Complex sentences constitute a major part of the Punjabi language. All large sentences are of either the compound or the complex type. Detailed analysis of complex sentences is helpful in processing the Punjabi language by computer. This study will be helpful in identifying and separating complex sentences from a Punjabi corpus. This study will also be helpful in developing other NLP applica...




Publication date: 2013